A Large-scale Batch-learning Self-organizing Map for Function Prediction of Poorly-characterized Proteins Progressively Accumulating in Sequence Databases
Abstract
Homology searches of nucleotide and amino-acid sequences are widely used to predict the functions of genes and proteins when genomes are decoded, and have thus become a basic bioinformatics tool. Although the usefulness of sequence homology search is apparent, it has become increasingly clear that it can predict protein function for only 50% of genes, or fewer, when a novel genome is decoded. As extensive genome sequences from a wide variety of phylotypes are decoded, proteins whose function cannot be predicted by homology search of amino-acid sequences are accumulating progressively and therefore remain of no use to science and industry. A method for estimating protein function that does not depend on sequence homology search is urgently needed. We previously developed a Batch-Learning SOM (BL-SOM) for genome informatics, which does not depend on the order of data input. This report focuses on BL-SOM analyses of the frequencies of two and three contiguous amino acids (di- and tripeptides). For the di- and tripeptide frequencies in the 110,000 proteins that have been classified into 2,853 function-known COGs (clusters of orthologous groups of proteins, which represent individual functional categories), we obtained BL-SOMs that faithfully reproduced the COG classifications. This indicates that proteins whose functions are presently unknown because they lack significant homology with function-known proteins can be related to function-known proteins with the BL-SOM.
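The two ingredients named in the abstract, peptide-frequency vectors and an order-independent batch update, can be illustrated with a minimal sketch. This is not the authors' BL-SOM implementation: it is a generic batch SOM step with a Gaussian neighborhood, and the names `peptide_frequencies` and `batch_som_epoch`, the 20-letter residue alphabet, and the 2-D grid coordinates are all assumptions made for illustration.

```python
from itertools import product
import math

# Assumption: only the 20 standard residues are counted.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def peptide_frequencies(sequence, k=2):
    """Relative frequency of every contiguous k-residue peptide (k=2: dipeptides)."""
    kmers = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(sequence) - k + 1):
        window = sequence[i:i + k]
        if window in counts:  # skip windows containing non-standard symbols
            counts[window] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]


def batch_som_epoch(weights, grid, data, radius):
    """One generic batch SOM update (hypothetical helper, not the authors' rule).

    All inputs are matched against the same frozen weights, and every node is
    then replaced by a neighborhood-weighted mean of the inputs, so the result
    does not depend on the order in which the inputs are presented.
    """
    # Step 1: find the best-matching node for each input.
    bmus = []
    for x in data:
        best = min(
            range(len(weights)),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(x, weights[j])),
        )
        bmus.append(best)

    # Step 2: update every node at once from the accumulated assignments.
    new_weights = []
    for j, w in enumerate(weights):
        numerator = [0.0] * len(w)
        denominator = 0.0
        for x, b in zip(data, bmus):
            d2 = (grid[j][0] - grid[b][0]) ** 2 + (grid[j][1] - grid[b][1]) ** 2
            h = math.exp(-d2 / (2.0 * radius ** 2))  # Gaussian neighborhood
            denominator += h
            for d, value in enumerate(x):
                numerator[d] += h * value
        if denominator > 0:
            new_weights.append([n / denominator for n in numerator])
        else:
            new_weights.append(list(w))
    return new_weights
```

With `k=2` each protein becomes a 400-dimensional vector and with `k=3` an 8,000-dimensional one; such vectors would be the inputs passed as `data` above.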
Similar resources
A Large-scale Batch-learning Self-organizing Map for Function Prediction of Poorly-characterized Proteins Progressively Accumulating in Sequence Databases : Annual Report of the Earth Simulator Center April 2007 - March 2008
As a result of decoding of extensive genome sequences, a large number of proteins whose function cannot be predicted by the homology search of amino acid sequences is progressively accumulated and thus remains of no use in science and industry. A method to predict the protein function that does not depend on the sequence homology search is in urgent need. We previously developed a Batch-Learnin...
The Time Adaptive Self Organizing Map for Distribution Estimation
The feature map represented by the set of weight vectors of the basic SOM (Self-Organizing Map) provides a good approximation to the input space from which the sample vectors come. But the time-decreasing learning rate and neighborhood function of the basic SOM algorithm reduce its capability to adapt weights for a varied environment. In dealing with non-stationary input distributions and changi...
A Novel Bioinformatics Strategy to Analyze Microbial Big Sequence Data for Efficient Knowledge Discovery: Batch-Learning Self-Organizing Map (BLSOM)
With the remarkable increase of genomic sequence data of microorganisms, novel tools are needed for comprehensive analyses of the big sequence data available. The self-organizing map (SOM) is an effective tool for clustering and visualizing high-dimensional data, such as oligonucleotide composition on one map. By modifying the conventional SOM, we developed batch-learning SOM (BLSOM), which all...
How to make large self-organizing maps for nonvectorial data
The self-organizing map (SOM) represents an open set of input samples by a topologically organized, finite set of models. In this paper, a new version of the SOM is used for the clustering, organization, and visualization of a large database of symbol sequences (viz. protein sequences). This method combines two principles: the batch computing version of the SOM, and computation of the generaliz...
NGTSOM: A Novel Data Clustering Algorithm Based on Game Theoretic and Self-Organizing Map
Identifying clusters is an important aspect of data analysis. This paper proposes a novel data clustering algorithm to increase the clustering accuracy. A novel game theoretic self-organizing map (NGTSOM) and neural gas (NG) are used in combination with Competitive Hebbian Learning (CHL) to improve the quality of the map and provide a better vector quantization (VQ) for clustering data. Different ...
Publication date: 2007